Just about everyone on the planet knows about the World Wide Web. It's the most
talked-about aspect of the Internet. With the WWW's popularity, more system users are
getting into the game by setting up their own WWW servers and home pages. Sophisticated
packages now act as Web servers for many operating systems, although UNIX users have
always done it from scratch. Linux, based on UNIX, has the software necessary to provide a
Web server readily available.
You don't need fancy software to set up a Web site, only a little time and the correct
configuration information. That's what this chapter is about. The chapter looks at how you
can set up a World Wide Web server on your Linux system, whether for friends, your LAN, or
the Internet as a whole.
The major aspect of the Web that attracts users and makes it so powerful, aside from
its multimedia capabilities, is the use of hyperlinks. A hyperlink lets you move
with only one mouse click from document to document, site to site, graphic to movie, and
so on. All the instructions for the move are built into the Web document itself.
There are two aspects to the World Wide Web: server and client. Client software, such as
Mosaic and Netscape, is the better known of the two. However, many other Web client
packages are available, some specifically for X or Linux.
There are three primary versions of Web server software that will run under Linux. They
are from NCSA, CERN, and Plexus. The most readily available system is from NCSA, which
also provides Mosaic. NCSA's Web system is fast and quite small, can run under inetd or as
a stand-alone daemon, and provides pretty good security. This chapter uses NCSA's Web
software, although you can easily use either of the other two packages instead (some of the
configuration information will be different, of course).
The Web server software is available via anonymous FTP or WWW from one of the three sites listed below, depending on the type of server software you want:
CERN: ftp://info.cern.ch.pub/www.bin (FTP)
NCSA: ftp.ncsa.edu (FTP)
http://boohoo.ncsa.uiuc.edu (WWW)
Plexus: ftp://autsin.bsdi.com/plexus/2.2.1/dist/Plexus.html (WWW)
The NCSA Web software is available for Linux in both compiled and source code forms.
Using the compiled version is much easier because you don't have to configure and compile
the source code for the PC and Linux platforms. The binaries are often provided compressed
and tarred, so you will have to uncompress them and then extract the tar archive.
Alternatively, many CD-ROMs provide the software ready-to-go. If you do obtain the
compressed form of the Web server software, follow the installation or readme files to
place the Web software in the proper location.
If you have obtained an archive of source code or binaries from an FTP or BBS site, you
will probably have to uncompress and untar it first. (Check any README files before
you do this, if there are any; otherwise, you may be doing this step for nothing.)
Usually, you proceed by creating a directory for the Web software, then changing into it
and expanding the archive with a command like this:
zcat httpd_X.X_XXX.tar.Z | tar xvf -
The software is often named by the release and target platform, such as
httpd_1.5_linux.tar.Z. Use whatever name your tar file has in the above line. Installation
instructions are sometimes in a separate compressed file, such as Install.txt.z, which you
will have to obtain and display with the following command:
zcat Install.txt.z
Make sure you are in the target directory when you issue the commands above, though, or
you will have to move a lot of files. You can place the files anywhere, although it is
often a good idea to create a special area for the Web software that can have its
permissions controlled, such as /usr/web, /var/web, or a similar name.
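The unpacking steps described above can be collected into a small shell function. This is only a minimal sketch, not part of the NCSA distribution: the function name unpack_web is made up for illustration, and the archive name and target directory are whatever you pass to it.

```shell
# unpack_web ARCHIVE TARGET
# Hypothetical helper: create TARGET, change into it, and expand a
# compressed tar archive there, as the text describes.
unpack_web() {
    archive=$1
    target=$2

    # Turn a relative archive name into a full path, because we are
    # about to change directories.
    case $archive in
        /*) ;;
        *)  archive=$PWD/$archive ;;
    esac

    mkdir -p "$target" || return 1

    # The subshell keeps the caller's working directory unchanged.
    ( cd "$target" && zcat "$archive" | tar xvf - )
}
```

A call such as unpack_web httpd_1.5_linux.tar.Z /usr/web would then create /usr/web if necessary and expand the archive inside it.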
Once you have extracted the contents of the Web server distribution and the library
files are in their proper directories, you can look at what has been created
automatically. You should have the following subdirectories:
| cgi-bin | Common gateway interface binaries and scripts |
| conf | Configuration files |
| icons | Icons for home pages |
| src | Source code and (sometimes) executables |
| support | Support applications |
If you don't have to modify the source code and recompile it under Linux, you can skip
the configuration details mentioned in the rest of this section. On the other hand, you
may want to know what is happening in the source code anyway, because you can better
understand how Linux works with the Web server code. If you obtained a generic, untailored
version of the NCSA Web server, you will have to configure the software.
Begin by editing the src/Makefile file to specify your platform. You have to check
several variables for proper information:
| AUX_CFLAGS | Uncomment the entry for Linux (identified by comment lines and symbols, usually) |
| CC | Specify the name of the C compiler (usually cc or gcc) |
| EXTRA_LIBS | Add any extra libraries that need to be linked in (none are required for Linux) |
| LFLAGS | Add any flags you need for linking (none are required for most Linux linkers) |
Finally, look for the CFLAGS variable. Some of the values for CFLAGS may be set
already. Valid values for CFLAGS are as follows:
| -DSECURE_LOGS | Prevents CGI scripts from interfering with any log files written by the server software |
| -DMAXIMUM_DNS | Provides a more secure resolution system at the cost of performance |
| -DMINIMAL_DNS | Doesn't allow reverse name resolution, but speeds up performance |
| -DNO_PASS | Prevents multiple children from being spawned |
| -DPEM_AUTH | Enables PEM/PGP authentication schemes |
| -DXBITHACK | Provides a service check on the execute bit of an HTML file |
| -O2 | Enables compiler optimization |
It is unlikely that you will need to change any of the flags in the CFLAGS section, but
at least you now know what they do. Once you have checked the src/Makefile for its
contents, you can compile the server software. Change into the src directory and issue the
command:
make
If you see error messages, check the Makefile carefully. The most common
problem is the wrong platform (or multiple platforms) selected in the file.
Once the Web server software has been compiled, you have to compile the support
applications, too. Change into the support directory and check the Makefile there. Once it
is correct, issue the make command again. Then, change to the cgi-src directory and repeat
the process.
Some versions of NCSA Web server software (notably releases 1.4 or later) enable you to compile all three sets of source code with a single make command issued from the Web directory, with your platform as the target (for example, make sgi on an SGI system).
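If your release does not offer a single top-level build, the three separate make runs can be scripted. The following is a minimal sketch under a few assumptions: the function name build_all is hypothetical, the three directory names are the ones mentioned in the text, and each Makefile has already been edited for your platform.

```shell
# build_all WEBDIR
# Run make in each source directory named in the text, stopping at the
# first failure. Set MAKE in the environment to use a different command.
build_all() {
    webdir=$1
    for dir in src support cgi-src; do
        echo "Building in $webdir/$dir"
        # The subshell keeps each cd from affecting the next iteration.
        ( cd "$webdir/$dir" && ${MAKE:-make} ) || return 1
    done
}
```

Calling build_all /usr/web would build the server, the support applications, and the CGI programs in turn, and return a nonzero status as soon as one of the builds fails.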
Once the software is in the proper directories and compiled for your platform, it's
time to configure the system. Begin with the httpd.conf-dist file. This file handles the
httpd server daemon. Before you edit the file, you have to decide whether you will install
the Web server software to run as a daemon, or whether it will be started by inetd. If you
anticipate a lot of use, run the software as a daemon. For occasional use, either is
acceptable.
Several variables in httpd.conf-dist need to be checked or have values entered for
them. All the variables in the configuration file use the following syntax:
variable value
Note that there is no equal sign or special symbol between the variable name and the
value assigned to it. For example, a few lines would look like this:
FancyIndexing on
HeaderName Header
ReadmeName README
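Because each line is simply a name followed by a value, the files are easy to inspect from the shell. The sketch below is a hypothetical helper (get_directive is not part of the server software) that prints the value assigned to one variable. It assumes one variable per line and compares names exactly as typed.

```shell
# get_directive FILE NAME
# Print the value assigned to NAME in FILE (first match only).
# Assumes one "variable value" pair per line; the name comparison is
# case-sensitive, which is an assumption of this sketch.
get_directive() {
    awk -v want="$2" '
        $1 == want {
            $1 = ""                 # drop the variable name
            sub(/^[ \t]+/, "")      # trim the separator left behind
            print
            exit
        }
    ' "$1"
}
```

For example, get_directive srm.conf ReadmeName would print README for a file containing the three example lines shown above.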
Where pathnames or filenames are supplied, they are usually relative to the Web server
directory, unless explicitly declared as a full pathname. The variables you need to supply
in httpd.conf-dist are as follows:
The next configuration file to check is srm.conf-dist, which is used to handle the server
resources. The variables that have to be checked or set in the srm.conf-dist file are as
follows:
The third file to examine and modify is access.conf-dist, which defines the services
available to WWW browsers. Usually, everything is accessible to a browser, but you may
want to modify the file to tighten security or disable some services not supported on your
Web site. The format of the access.conf-dist file is different from the two configuration files
you saw above. It uses a set of sectioning directives delineated by angle brackets. The
general format of an entry is:
<Directory> ... </Directory>
Any items between the beginning and ending delimiters (<Directory> and
</Directory> respectively) are directives. It's not quite that easy because several
variations can exist in the file. The best way to customize the access.conf-dist file is
to follow these steps for a typical Web server installation:
The Limit directive controls access to your server. The valid values for the Limit
directive are:
| allow | Permits specific hostnames following the allow keyword to access the service |
| deny | Denies specific hostnames following the deny keyword from accessing the service |
| order | Specifies the order in which allow and deny directives are evaluated (usually set to deny,allow but can also be allow,deny) |
| require | Requires authentication through a user file specified in the AuthUserFile entry |
The Options directive can have several entries, all of which have a different purpose.
The default entry for Options is:
Options Indexes FollowSymLinks
The authors removed the Indexes entry from the Options directive in the first step of
the customization procedure. These entries all apply to the directory the Options field
appears in. The valid entries for the Options directive are as follows:
| All | Enables all features |
| ExecCGI | Specifies that CGI scripts can be executed in this directory |
| FollowSymLinks | Enables httpd to follow symbolic links |
| Includes | Enables include files for the server |
| IncludesNoExec | Enables include files for the server but disables the exec option |
| Indexes | Enables users to retrieve indexes (doesn't affect precompiled indexes) |
| None | No features are enabled |
| SymLinksIfOwnerMatch | Follows symbolic links only if the user ID matches |
The AllowOverride variable is set to All by default, and you should change this
setting. There are several valid values for AllowOverride, but the recommended setting for
most Linux systems is None. The valid values for AllowOverride are as follows:
After you have done all that, your configuration files should be properly set. Although
the syntax is a little confusing, reading the default values will show you the proper
format to use when changing entries. Next, you can start the Web server software.
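To show how the Directory, Options, AllowOverride, and Limit entries fit together, here is a sketch that prints one complete entry. It is an illustration only: the function name write_access_conf is hypothetical, the document directory and domain are whatever you pass in, and the policy shown (deny everyone, then allow one domain) is just one possible choice.

```shell
# write_access_conf DOCROOT DOMAIN
# Print a sample access entry on standard output. DOCROOT and DOMAIN
# are placeholders supplied by the caller, not values from the server.
write_access_conf() {
    docroot=$1
    domain=$2
    cat <<EOF
<Directory $docroot>
Options FollowSymLinks
AllowOverride None
<Limit GET>
order deny,allow
deny from all
allow from $domain
</Limit>
</Directory>
EOF
}
```

With order set to deny,allow, the deny line is evaluated first and the allow line second, so hosts in the named domain are let in while all others are refused. You could review the output of write_access_conf /usr/web/htdocs .tpci.com before pasting it into access.conf-dist; both arguments here are examples only.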
Begin by copying all your *.conf-dist files (modified in the previous section) to
*.conf (a change in the extension only). Copy the files instead of renaming them so that
you have the original .conf-dist file for future modifications. The server looks for files
with the .conf extension and will ignore .conf-dist files.
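The copy can be done in one pass with a short loop. This is a minimal sketch; the function name copy_conf is made up for illustration, and the directory you pass should be the conf directory of your own installation.

```shell
# copy_conf CONFDIR
# Copy every *.conf-dist file in CONFDIR to the matching *.conf name,
# leaving the original -dist files in place.
copy_conf() {
    confdir=$1
    for dist in "$confdir"/*.conf-dist; do
        # Skip the unexpanded pattern when no files match.
        [ -f "$dist" ] || continue
        # ${dist%-dist} strips the trailing "-dist" from the name.
        cp "$dist" "${dist%-dist}"
    done
}
```

For example, copy_conf /usr/web/conf would produce httpd.conf alongside httpd.conf-dist, and so on for the other files.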
When your configuration is complete, it's time to try out the Web server software. In
the configuration files, you made a decision as to whether the Web software will run as a
daemon (stand-alone) or be started from inetd. The startup procedure is a little different
for each method (as you would expect), but both startup procedures can use one of the
following three options on the command line:
If you are using inetd to start your Web server software, you need to make a change to
the /etc/services file to enable the Web software. Add a line like this to the
/etc/services file:
http port/tcp
In this line, port is the port number used by your Web server software (usually 80).
Next, modify the /etc/inetd.conf file to include the startup commands for the Web
server:
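http stream tcp nowait nobody /usr/web/httpd httpd
The first entry must match the service name you added to /etc/services. The last two
entries are the path to the httpd binary and the name the program is started under. Once
this is done, make inetd reread its configuration by sending it a hangup signal (kill -HUP
followed by the inetd process ID) or by rebooting your system, and the service should be
available through whatever port you specified in /etc/services.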
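When you script these changes, it helps to avoid adding the same line twice. The helper below is a sketch only (add_line_once is a hypothetical name). It appends a line to a file unless an identical line is already present.

```shell
# add_line_once FILE LINE
# Append LINE to FILE unless an identical line already exists.
add_line_once() {
    file=$1
    line=$2
    # -x matches whole lines, -F treats LINE as plain text, -q is silent.
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example calls (run as root, and back up both files first):
#   add_line_once /etc/services   "http 80/tcp"
#   add_line_once /etc/inetd.conf "http stream tcp nowait nobody /usr/web/httpd httpd"
```

The port number and the path to httpd in the example calls are assumptions; use the values that apply to your system. The service name at the start of the inetd.conf line has to be the same name used in /etc/services.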
If you are running the Web server software as a daemon, you can start it at any time
from the command line with the following command:
httpd
Even better, add the startup commands to the proper rc startup files. The entry usually
looks like this:
# start httpd
if [ -x /usr/web/httpd ]
then
    /usr/web/httpd
fi
You should substitute the proper paths for the httpd binary, of course. Rebooting your
machine should start the Web server software on the default port number.
To test the Web server software, start any Web browser and enter a URL in its location
field like this:
http://machinename
Replace machinename with the name of your Web server. If you see the contents of the
root Web directory or the index.html file, all is well. Otherwise, check the log files and
configuration files for clues as to the problem.
If you haven't loaded a Web browser yet, you can still check whether the Web server is
running by using telnet. Issue a command like this:
telnet www.wizard.tpci.com 80
Substitute the name of your server (and your Web port number if different from 80). Once
the connection opens, type the HEAD request shown below and press Enter twice. You should
see output similar to this if the Web server is responding properly:
Connected to wizard.tpci.com
Escape character is '^]'.
HEAD / HTTP/1.0
HTTP/1.0 200 OK
You should also get some more lines showing details about the date and content. You may
not be able to access anything, but this shows that the Web software is responding
properly.
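The same check can be scripted. The two functions below are a minimal sketch with hypothetical names: head_request prints the request you would otherwise type into telnet, and status_code reads a server reply and prints the numeric status from its first line.

```shell
# head_request
# Print a minimal HTTP/1.0 HEAD request for the root document. The
# request line is followed by a blank line, which ends the request.
head_request() {
    printf 'HEAD / HTTP/1.0\r\n\r\n'
}

# status_code
# Read a server reply on standard input and print the status number
# from its first line (for example, 200 from "HTTP/1.0 200 OK").
status_code() {
    IFS=' ' read -r version code reason
    printf '%s\n' "$code"
}
```

If a network utility such as nc (netcat) is installed, which is an assumption about your system, the pieces can be joined as head_request | nc www.wizard.tpci.com 80 | status_code. A result of 200 means the server answered normally.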
Having a server with nothing for content is useless, so you need to set up the
information you will share through your Web system. This begins with Uniform Resource
Locators (URLs), which are addresses that name the protocol, host, and path of each file.
Anyone using your service has only to know the URL. You don't need to have anything fancy.
If you don't have a special home
page, anyone connecting to your system will get the contents of the Web root directory's
index.html file, or failing that, a directory listing of the Web root directory. That's
pretty boring, though, and most users want fancy home pages. To write a home page, you
need to use HTML (HyperText Markup Language).
A home page is like a main menu. Many users may not ever see it because they can enter
any of the subdirectories on your system or obtain files from another Web system through a
hyperlink, without ever seeing your home page. Many users, however, want to start at the
top, and that's where your home page comes in. A home page file is usually called
index.html (or home.html if an index file exists). It usually is at the top of your Web
source directories.
Writing an HTML document is not too difficult. The language uses a set of tags to
indicate how the text is to be treated (such as headlines, body text, figures, and so on).
The tricky part of HTML is getting the tags in the right place, without extra material on
a line. HTML is rather strict about its syntax, so errors must be avoided to prevent
problems.
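To give a feel for how tags mark up a page, the sketch below prints a very small home page. It is an illustration only: the function name write_home_page is hypothetical, the welcome text is invented, and docs/index.html is a placeholder link target, not a file the server provides.

```shell
# write_home_page TITLE
# Print a minimal HTML page on standard output. Each element is
# enclosed by an opening tag and a matching closing tag.
write_home_page() {
    title=$1
    cat <<EOF
<HTML>
<HEAD>
<TITLE>$title</TITLE>
</HEAD>
<BODY>
<H1>$title</H1>
<P>Welcome to this Web server.</P>
<A HREF="docs/index.html">Documents</A>
</BODY>
</HTML>
EOF
}
```

Redirecting the output, as in write_home_page "My Home Page" > index.html, gives you a file you can place at the top of your Web directories and open in a browser.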
In the early days of the Web, all documents were written with simple text editors. As
the Web expanded, dedicated Web editors that understand HTML and the use of tags began to
appear. Their popularity has driven developers to produce dozens of editors, filters, and
utilities, all aimed at making a Web documenter's life easier (and at ensuring that HTML
is used properly). HTML editors are available for many operating systems.
You can write HTML documents in many ways: you can use an ASCII editor, a word
processor, or a dedicated HTML tool. The choice of which you use depends on personal
preference and your confidence in HTML coding, as well as which tools you can obtain
easily. Because many HTML-specific tools have checking routines or filters to verify that
your documents are correctly laid out and formatted, they can be appealing. They also tend
to be more friendly than non-HTML editors. On the other hand, if you are a veteran
programmer or writer, you may want to stick with your favorite editor and use a filter or
syntax checker afterwards.
One of the best sites to look for new editors and filters is CERN. Connect to
http://info.cern.ch/WWW/Tools and check the document Overview.html. Also check the NCSA
site, accessible at http://www.ncsa.uiuc.edu/SDG/Software/Mosaic/Docs where the document
faq-software.html contains an up-to-date list of offerings.
You can use any ASCII editor to write HTML pages, including general-purpose text
editors such as vi or Emacs. They all enable you to enter tags into a page of text, but
the tags are treated as words with no special meaning. There is no validity checking
performed by simple editors, as they simply don't understand HTML. There are some
extensions for Emacs and similar full-screen editors that provide a simple template check,
but they are not rigorous in enforcing HTML styles.
If you want to use a plain editor, you should carefully check your document for valid
use of tags. One of the easiest methods of checking a document is to import it into an
HTML editor that has strict syntax checking. Another easy method is to simply call up the
document on your Web browser and carefully study its appearance.
You can obtain a dedicated HTML authoring package from some sites, although they are
not as common for Linux as for DOS and Windows. If you are running both operating systems,
you can always develop your HTML documents in Windows, then import them to Linux. Several
popular HTML tools for Windows are available, such as HTML Assistant, HTMLed, and
HoTMetaL. A few of the WYSIWYG editors are also available for X, and hence run under
Linux, such as HoTMetaL. Some HTML authoring tools are fully WYSIWYG, and others are
character-based. Most offer strong verification systems for generated HTML code.
For the latest Linux or Windows version of HoTMetaL, try the FTP site:
ftp://ftp.ncsa.uiuc.edu/Web/html/hotmetal.
An alternative to using a dedicated editor for HTML documents is to enhance an existing
WYSIWYG word processor to handle HTML properly. The most commonly targeted word processors
for these extensions are Word for Windows, WordPerfect, and Word for DOS. Several
extension products are available, of varying degrees of complexity. Most run under
Windows, although a few have been ported to Linux.
The advantage to using one of these extensions is that you retain a familiar editor and
make use of the near-WYSIWYG features it can provide for HTML documents. Although it can't
show you the final document in Web format, it can be close enough to prevent all but the
most minor problems.
CU_HTML is a template for Microsoft's Word for Windows that gives an almost WYSIWYG view
of HTML documents. As a template, it adds its own DLLs to Word to
enhance the system. Graphically, it looks much the same as Word, but with a new toolbar
and pull-down menu item. CU_HTML provides a number of different styles and a toolbar of
often-used tasks. Tasks like linking documents are easy, as are most tasks that tend to
worry new HTML document writers. Dialog boxes are used for many tasks, simplifying the
interface considerably.
The only major disadvantage to CU_HTML is that it can't be used to edit existing HTML
documents because they are not in Word format. When CU_HTML creates an HTML document, two
versions are produced, one in HTML and the other as a Word .DOC file. Without both, the
document can't be edited. An existing document can be imported, but it loses all the tags.
Like CU_HTML, ANT_HTML is an extension to Word. Compared with CU_HTML, ANT_HTML has both
advantages and disadvantages. The documentation and help are better with ANT_HTML, and the
toolbar is much better. There's also automatic insertion of opening and closing tags as
needed.
However, ANT_HTML requires that any inline GIF images be inserted instead of using a
DLL. This means that you may have to hunt for a suitable filter. Also, like CU_HTML,
ANT_HTML can't handle documents that were not produced with ANT_HTML.
One system that has gained popularity among Linux users is tkWWW. A tool for the Tcl
language and its Tk extension for X, tkWWW is a combination of a Web browser and a
near-WYSIWYG HTML editor. Although originally UNIX-based, tkWWW has been ported to several
other platforms, including Windows and Macintosh.
tkWWW can be obtained through anonymous FTP to ftp.aud.alcatel.com in the directory /pub/tcl/extensions. Copies of Tcl and Tk can be found at several sites depending on the platform required, although most versions of Linux have Tcl and Tk included in the distribution set. As a starting point, try anonymous FTP to ftp.cs.berkeley.edu in the directory /ucb/tcl.
When you create a Web page with tkWWW in editor mode, you can then flip modes to
browser to see the same page properly formatted. In editor mode, most of the formatting is
correct, but the tags are left visible. This makes for fast development of a Web page.
Unfortunately, tkWWW must rely on Tk for its windowing, which tends to slow things down
a bit on average processors. Also, the browser aspect of tkWWW is not impressive, using
standard Tk frames. However, as a prototyping tool, tkWWW is very attractive, especially
if you know the Tcl language.
Another option is to use an HTML filter. An HTML filter is a tool that lets you take a
document produced with any kind of editor (including ASCII text editors) and convert the
document to HTML. Filters are useful when you work in an editor that has its own
proprietary format, such as Word or nroff.
HTML filters are attractive if you want to continue working in your favorite editor and
simply want a utility to convert your document with tags to HTML. Filters tend to be fast
and easy to work with because they take a filename as input and generate an HTML output
file. The degree of error checking and reporting varies with the tool.
Filters are available for most types of documents. Many are available directly
for Linux, or as source code that can be recompiled without modification under Linux. Word
for Windows and Word for DOS documents can be converted to HTML with the CU_HTML and
ANT_HTML extensions mentioned earlier. A few stand-alone conversion utilities have also
begun to appear. The utility WPTOHTML converts WordPerfect documents to HTML. WPTOHTML is
a set of macros for WordPerfect versions 5.1, 5.2, and 6.0. The WordPerfect filter can
also be used with other word processor formats that WordPerfect can import.
FrameMaker and FrameBuilder documents can be converted to HTML format with the tool
FM2HTML. FM2HTML is a set of scripts that converts Frame documents to HTML while
preserving hypertext links and tables. It also handles GIF files without a problem.
Because Frame documents are platform-independent, Frame documents developed on a PC or
Macintosh could be moved to a Linux platform and FM2HTML executed there.
A copy of FM2HTML is available by anonymous FTP from bang.nta.no in the directory /pub. The UNIX set is called fm2-html.tar.v.0.n.m.Z.
LaTeX and TeX files can be converted to HTML with several different utilities. Quite a
few Linux-based utilities are available, including LATEXTOHTML, which can even handle
in-line LaTeX equations and links. For simpler documents, the utility VULCANIZE is faster
but can't handle mathematical equations. Both LATEXTOHTML and VULCANIZE are Perl scripts.
LATEXTOHTML is available through anonymous FTP from ftp.tex.ac.uk in the directory pub/archive/support as the file latextohtml. VULCANIZE can be obtained from the Web site http://www.cis.upenn.edu in the directory mjd as the file vulcanize.html.
RTFTOHTML is a common utility for converting RTF format documents to HTML. Many word
processors handle RTF formats, so you can save an RTF document from your favorite word
processor and then run RTFTOHTML against it.
RTFTOHTML is available through anonymous FTP from ftp.cray.com in the directory src/WWWstuff/RTF. Through the Web, try http://info.cern.ch/hypertext/WWW/Tools and look for the file rtftoftml-2.6.html (or a later version).
Once you have written a Web document and it is available to the world, your job doesn't
end. Unless your document is a simple text file, you will have links to other documents or
Web servers embedded. You must verify these links at regular intervals. Also, the
integrity of your Web pages should be checked at intervals, to ensure that the flow of the
document from your home page is correct.
Several utilities are available to help you check links and to scan the Web for other
sites or documents you may want to provide a hyperlink to. These utilities tend to go by a
number of names, such as robot, spider, or wanderer. They are all programs that move
across the Web automatically, creating a list of Web links that you can access. (Spiders
are similar to the Archie and Veronica tools for the Internet, although neither of these
covers the Web.)
Although they are often thought of as utilities for users only (to get a list of sites
to try), spiders and their kin are useful for document authors, too, as they show
potentially useful and interesting links. One of the best known spiders is the World Wide
Web Worm, or WWWW. WWWW enables you to search for keywords or create a Boolean search, and
it can cover titles, documents, and several other search types (including a search of all
known HTML pages).
A similarly useful spider is WebCrawler, which is similar to WWWW except that it can
scan entire documents for matches of any keywords and display the result in an ordered
list from closest match to least match.
A copy of World Wide Web Worm can be obtained from http://www.cs.colorado.edu/home/mcbryan/WWWW.html. WebCrawler is available from http://www.biotech.washington.edu/WebCrawler/WebCrawler.html.
A common problem with HTML documents as they age is that links may point to files or
servers that no longer exist (because either the locations or the documents have changed).
Therefore, it is good practice to validate the hyperlinks in a document on a regular
basis. A popular hyperlink analyzer is HTML_ANALYZER. It examines each hyperlink and the
contents of the hyperlink to ensure that they are consistent. HTML_ANALYZER functions by
examining a document to find all links, then creating a text file that has a list of the
links in it. HTML_ANALYZER uses the text files to compare the actual link content to what
it should be.
HTML_ANALYZER actually does three tests: it validates the availability of the documents
pointed to by hyperlinks (called validation); it looks for hyperlink contents that occur
in the database but are not themselves hyperlinks (called completeness); and it looks for
a one-to-one relation between hyperlinks and the contents of the hyperlink (called
consistency). Any deviations are listed for the user.
HTML_ANALYZER users should have a good familiarity with HTML, their operating system,
and the use of command-line driven analyzers. The tool must be compiled using the
"make" utility prior to execution. There are several directories that must be
created prior to running HTML_ANALYZER, and it creates several temporary files when it
runs that are not cleaned up, so this is not a good utility for a novice.
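For a quick check that needs no compilation, a few lines of shell can cover the simplest case: links to local files that have gone missing. The sketch below is not HTML_ANALYZER and performs only a rough version of its validation test. The function names are hypothetical. It assumes that link targets are written in double quotes after HREF=, and it resolves them relative to the page's own directory, so links that begin with / are not handled correctly.

```shell
# list_links PAGE
# Print the target of every HREF="..." attribute found in PAGE.
list_links() {
    grep -io 'href="[^"]*"' "$1" | sed 's/^[^"]*"//; s/"$//'
}

# missing_local_links PAGE
# Print each link in PAGE that names a local file which does not exist.
missing_local_links() {
    page=$1
    dir=$(dirname "$page")
    list_links "$page" | while IFS= read -r link; do
        case $link in
            # Skip remote URLs, mail addresses, and in-page anchors.
            *://*|mailto:*|'#'*) continue ;;
        esac
        # Drop any "#section" suffix before looking for the file.
        target=${link%%#*}
        [ -e "$dir/$target" ] || printf '%s\n' "$link"
    done
}
```

Running missing_local_links index.html from time to time lists the local links that need attention. Remote links still have to be checked with a browser or a dedicated tool.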
Setting up your home page requires you to either use an HTML authoring tool or write HTML code directly in an editor. HTML itself is beyond the scope of this book, but you should find several good guides to it at your bookstore, and it is rather easy to learn. With the information in this chapter, you should be able to set up your Web site to enable anyone on the Internet to connect to you. Enjoy the Web!